Predictive Modeling from Distributed Datasets

ABSTRACT

Techniques for using data sets for a predictive model are described. According to various implementations, techniques described herein enable different data sets to be used to generate a predictive model, while minimizing the risk that individual data points of the data sets will be exposed by the predictive model. This aids in protecting individual privacy (e.g., protecting personally identifying information for individuals), while enabling robust predictive models to be generated using data sets from a variety of different sources.

RELATED APPLICATION

This application claims priority to U.S. provisional application No. 62/472,962, filed on 17 Mar. 2017 and titled “Predictive Modeling,” the disclosure of which is incorporated by reference in its entirety herein.

BACKGROUND

Today's era of “big data” includes different data systems with access to tremendous amounts of data of a variety of different types, such as consumer data, educational data, medical data, social networking data, and so forth. This data can be processed in various ways and utilized for different useful purposes. Educational data, for instance, can be analyzed to identify different trends and outcomes in educational processes to optimize those processes. Medical data can be analyzed to identify predictive indicators of different medical conditions. Protecting privacy of individuals associated with data, however, is of paramount importance.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Techniques for using data sets for a predictive model are described. According to various implementations, techniques described herein enable different data sets to be used to generate a predictive model, while minimizing the risk that individual data points of the data sets will be exposed by the predictive model or by the process of generating it. This aids in protecting individual privacy (e.g., protecting personally identifying information for individuals), while enabling robust predictive models to be generated using data sets from a variety of different sources.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different instances in the description and the figures may indicate similar or identical items. Identical numerals followed by different letters in a reference number may refer to different instances of a particular item.

FIG. 1 is an illustration of an environment in an example implementation that is operable to employ techniques discussed herein.

FIG. 2 depicts an example implementation scenario for a high level overview of predictive model training in accordance with one or more implementations.

FIG. 3 depicts an example implementation scenario for predictive model training using distributed hosts in accordance with one or more implementations.

FIG. 4 is a flow diagram that describes steps in a method for enabling a predictive model to be generated in accordance with one or more implementations.

FIG. 5 is a flow diagram that describes steps in a method for generating a predictive model in accordance with one or more implementations.

FIG. 6 is a flow diagram that describes steps in a method for enabling a predictive model to be generated using multiple hosts in accordance with one or more implementations.

FIG. 7 is a flow diagram that describes steps in a method for enabling a predictive model to be generated using multiple hosts in accordance with one or more implementations.

FIG. 8 is a flow diagram that describes steps in a method for utilizing a predictive model in accordance with one or more implementations.

FIG. 9 illustrates an example system and computing device as described with reference to FIG. 1, which are configured to implement implementations of techniques described herein.

DETAILED DESCRIPTION

Techniques for using data sets for a predictive model are described. Generally, a predictive model represents a collection of evaluable conditions to which a data set can be applied to determine a possible, predicted outcome. In at least one implementation, a predictive model is a neural network.

According to various implementations, techniques described herein enable different data sets to be used to generate a predictive model, while minimizing the risk that individual data points of the data sets will be exposed by the predictive model. This aids in protecting individual privacy (e.g., protecting personally identifying information for individuals), while enabling robust predictive models to be generated using data sets from a variety of different sources.

In example implementations, different data sources with different data sets use their respective data sets as training sets to train a data model. As part of the training, the data sources obtain gradient values and submit the gradient values to an external system that processes the gradient values to determine optimal ways for training the data model to generate a predictive model, e.g., a trained neural network. The external system, for example, determines average gradient values based on a collection of gradient values from different data sources. Further, the external system adds noise to the average gradient values to avoid directly or inferentially exposing information about individual data points of the local data sets. The noisy gradient values are used to further train the data model and generate a trained predictive model.

According to various implementations, data sets used to generate a predictive model can be very large. Thus, techniques described herein enable local data sources that maintain the data sets to perform various local computations on their large data sets to generate gradient values. The gradient values can then be communicated to an external system that uses the gradient values to calculate optimum gradient values and add noise to the optimum gradient values for generating a predictive model that protects individual data points from exposure outside their respective data sets.

Thus, techniques described herein protect individual and group privacy by reducing the likelihood that individual records of a data set will be exposed when generating a predictive model using the data set. Further, computational and network resources are conserved by enabling local data sources to perform computations of gradients based on their own respective data sets, and enabling an external system to use the gradients to generate a predictive model based on the different data sets. The external system, for example, need not process entire large data sets, but can perform various calculations described herein using smaller data sets that summarize the larger data sets.

In the following discussion, an example environment is first described that is operable to employ techniques described herein. Next, some example implementation scenarios are described in accordance with one or more implementations. Following this, some example procedures are described in accordance with one or more implementations. Finally, an example system and device are described that are operable to employ techniques discussed herein in accordance with one or more implementations. Consider now an example environment in which example implementations may be employed.

FIG. 1 is an illustration of an environment 100 in an example implementation that is operable to employ techniques for using data sets for a predictive model described herein. Generally, the environment 100 includes various devices, services, and networks that enable data communication via a variety of different modalities. For instance, the environment 100 includes source systems 102 and a host system 104 connected to a network 106. Generally, the source systems 102 represent different data sources that can provide data for generating predictive models. The source systems 102 include various instances of information systems that collect and aggregate different types of data, such as medical information (e.g., patient records, medical statistics, and so forth) from medical institutions, education information from educational institutions, consumer information from enterprise entities, government information from governmental entities, social networking information regarding users of different social networking platforms, and so on. The source systems 102 may be implemented in various ways, such as servers, server systems, distributed computing systems (e.g., cloud servers), corpnets, and so on. Examples of different implementations of the source systems 102 are described below with reference to the example system 900.

The source systems 102 include data sets 108 and local computation modules (“local modules”) 110. The data sets 108 represent sets of different types of data, examples of which are described above. Generally, each of the source systems 102 aggregates and maintains its own respective data set 108. The local modules 110 represent functionality for performing different sets of computations on the data sets 108 as well as other types of data. As further detailed herein, some forms of computation can be performed locally by the local modules 110, while others can be performed at the host system 104.

The host system 104 is representative of functionality to perform various computations outside of the context of the source systems 102. For instance, the host system 104 can receive data from the source systems 102, and can perform different calculations using the data. Accordingly, the host system 104 includes a multiparty computation module (“multiparty module”) 112, which in turn includes a privacy module 114. In accordance with implementations for using data sets for a predictive model described herein, the multiparty module 112 represents functionality for performing various calculations on data received from the source systems 102 to generate predictive models 116. Generally, the predictive models 116 represent statistical models that are generated based on attributes of the data sets 108 and that can be used to predict various outcomes dependent on input data values. In at least one implementation, the predictive models 116 represent different instances of a neural network.

As further detailed below, cooperation between the source systems 102 and the host system 104 enables various attributes of the different data sets 108 to be used to generate the predictive models 116, while protecting the raw data from an individual data set 108 from being exposed (e.g., directly or by inference) across the different source systems 102. This enables multiple data sets 108 to be used to generate an individual predictive model 116, thus increasing the robustness and accuracy of the individual predictive model 116, while protecting a data set 108 from one source system 102 from being exposed to a different source system 102.

The network 106 is representative of a network that provides the source systems 102 and the host system 104 with connectivity to various networks and/or services, such as the Internet. The network 106 may be implemented via a variety of different connectivity technologies, such as broadband cable, digital subscriber line (DSL), wireless cellular, wireless data connectivity (e.g., WiFi™), T-carrier (e.g., T1), Ethernet, and so forth. In at least some implementations, the network 106 represents different interconnected wired and wireless networks.

While the source systems 102 and the host system 104 are depicted as being remote from one another, it is to be appreciated that in one or more implementations, one or more of the source systems 102 and the host system 104 may be implemented as part of a single, multifunctional system to perform various aspects of using data sets for a predictive model described herein. For instance, in some implementations, the host system 104 can be implemented as a secure hardware environment that is local to a particular source system 102, but that is protected from tampering by functionalities outside of the secure hardware environment.

Having described an example environment in which the techniques described herein may operate, consider now a discussion of some example implementation scenarios for using data sets for a predictive model in accordance with one or more implementations. The implementation scenarios may be implemented in the environment 100 discussed above, the system 900 described below, and/or any other suitable environment.

FIG. 2 depicts an example implementation scenario 200 which represents a high level overview of predictive model training in accordance with one or more implementations. The scenario 200 includes various entities and components introduced above with reference to the environment 100.

In the scenario 200, the host system 104 distributes an initial model 202 separately to a source system 102 a and a source system 102 b. Generally, the initial model 202 represents a starting data model that is subsequently trained according to techniques described herein to generate a predictive model. The source systems 102 a, 102 b represent separate sources of a data set 108 a and a data set 108 b, respectively. In at least one implementation, the data sets 108 a, 108 b represent different respective sets of data of a same type. For instance, the data sets 108 a, 108 b can include medical data, education data, enterprise data, and so forth.

Continuing with the scenario 200, local modules 110 a, 110 b of the source systems 102 a, 102 b each perform training operations 204 a, 204 b on their respective instances of the initial model 202 and using their respective data sets 108 a, 108 b to generate respective gradient values 206 a, 206 b. Generally, the training operations 204 a, 204 b can be performed in a variety of different ways for training a neural network. In this particular example, the training operations 204 a, 204 b represent a backpropagation technique that is applied to the initial models 202 using mini-batches 208 a, 208 b of the respective data sets 108 a, 108 b. Consider, for example, that the data sets 108 a, 108 b represent collections of data records, such as patient records from medical data. Accordingly, the mini-batches 208 a, 208 b represent subsets of the collections of data records. As further detailed below, generating a trained data model can be implemented as an iterative process with each iteration using a different mini-batch 208 a, 208 b of the data sets 108 a, 108 b.

The gradient values 206 a, 206 b generally represent respective gradients of a loss function utilized as part of the training operations 204 a, 204 b. Proceeding with the scenario 200, the source systems 102 a, 102 b communicate their respective gradient values 206 a, 206 b to the multiparty module 112, which processes the gradients 206 a, 206 b to generate an average gradient 210. An averaging function, for instance, is applied to the gradients 206 a, 206 b to generate the average gradient 210. The privacy module 114 then processes the average gradient 210 to generate a noisy gradient 212. For example, the privacy module 114 adds noise to the average gradient 210 to generate the noisy gradient 212. Generally, adding noise to the average gradient 210 reduces a likelihood that actual data values from the data sets 108 a, 108 b can be discovered or inferred from the noisy gradient 212.

The multiparty module 112 then communicates the noisy gradient 212 separately to the source systems 102 a, 102 b. The local modules 110 a, 110 b on the source systems 102 a, 102 b utilize the noisy gradient 212 to perform a training iteration on the initial model 202 to generate an updated model 214. According to various implementations, this process is repeated (e.g., for each of the mini-batches 208 a, 208 b) until all of the data sets 108 a, 108 b have been evaluated to generate a predictive model 116. Generally, the predictive model 116 represents an optimized version of the initial model 202 that can be evaluated with a set of input data to generate a predicted outcome value or set of values. The predictive model 116 may be generated at the host system 104, and/or individually at the source systems 102 a, 102 b.

FIG. 3 depicts an example implementation scenario 300 which represents predictive model training using distributed hosts in accordance with one or more implementations. The scenario 300, for instance, represents a variation on the scenario 200 described above.

The scenario 300 includes a host system 104 a and a host system 104 b, which represent different instances of the host system 104 introduced above. Generally, the host systems 104 a, 104 b represent individual autonomous systems that are able to communicate with one another to perform various aspects of techniques described herein, but that are also able to protect certain data from being accessible across the host systems 104 a, 104 b.

Similarly to the scenario 200, the source systems 102 a, 102 b start with an initial model 302 and calculate respective gradients 304 a, 304 b based on their respective data sets 108 a, 108 b. As mentioned above, the gradients 304 a, 304 b can be calculated using a backpropagation technique that is applied to the initial models 302 a, 302 b using the mini-batches 208 a, 208 b of the respective data sets 108 a, 108 b.

In the scenario 300, however, the source systems 102 a, 102 b use a secret sharing technique to further enhance the security and privacy aspects of techniques for predictive modeling from distributed datasets described herein. Accordingly, the source system 102 a calculates a perturbation value 306 a and generates a perturbed gradient 308 a which represents the gradient 304 a + perturbation value 306 a. In at least one implementation, the perturbation value 306 a represents a random vector with the same dimensions as the gradient 304 a. The source system 102 a then communicates the perturbed gradient 308 a to the host system 104 a, and the perturbation value 306 a to the host system 104 b.

Similarly, the source system 102 b calculates a perturbation value 306 b and generates a perturbed gradient 308 b which represents the gradient 304 b + perturbation value 306 b. The source system 102 b then communicates the perturbed gradient 308 b to the host system 104 a, and the perturbation value 306 b to the host system 104 b.

Continuing with the scenario 300, the host systems 104 a, 104 b sum the values that they've received from the respective source systems 102 a, 102 b. The host system 104 a, for instance, sums the perturbed gradients 308 a, 308 b to generate a gradient sum 310. The host system 104 a then adds noise to the gradient sum 310 to generate a noisy gradient sum 312.

Further, the host system 104 b sums the perturbation values 306 a, 306 b to generate a perturbation sum 314. The host system 104 b then adds noise to the perturbation sum 314 to generate a noisy perturbation sum 316.

The host systems 104 a, 104 b then engage in a cooperative protocol 318 using the noisy gradient sum 312 and the noisy perturbation sum 316 to generate a noisy gradient 320. The cooperative protocol 318, for instance, represents a secure computation procedure performed between the host systems 104 a, 104 b. In one example implementation, the cooperative protocol 318 is implemented as a garbled circuit protocol using the noisy gradient sum 312 and the noisy perturbation sum 316 as inputs to generate the noisy gradient 320. Generally, the noisy gradient 320 represents an average of the perturbed gradients 308 a, 308 b with noise added to the data.
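As a rough illustration of why splitting each contribution across the two host systems still permits a useful result, the following worked equations show the perturbation values cancelling when the two sums are combined by subtraction. The symbols are assumptions introduced only for this sketch: g_a and g_b stand for the gradients 304 a, 304 b, r_a and r_b for the perturbation values 306 a, 306 b, and n_1 and n_2 for the noise added at the host systems 104 a and 104 b. Any modular reduction is omitted for simplicity.

```latex
\begin{align*}
\tilde{g}_1 &= (g_a + r_a) + (g_b + r_b) + n_1 \\
\tilde{g}_2 &= r_a + r_b + n_2 \\
\tilde{g}_1 - \tilde{g}_2 &= g_a + g_b + (n_1 - n_2)
\end{align*}
```

Under these assumptions, neither host system alone holds an unperturbed gradient, yet the difference of the two sums depends only on the gradients and the added noise.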

Accordingly, the noisy gradient 320 is communicated to the source systems 102 a, 102 b, which use the noisy gradient 320 to update the initial model 302 to generate an updated model 322. Generally, this process is performed iteratively until a termination criterion is reached, such as when all of the mini-batches 208 a, 208 b have been evaluated, to obtain the predictive model 116. Thus, the scenario 300 illustrates that distributed calculations can be utilized to further enhance security of techniques for predictive modeling from distributed datasets described herein.

Having discussed some example implementation scenarios, consider now a discussion of some example procedures in accordance with one or more implementations.

The following discussion describes some example procedures for using data sets for a predictive model in accordance with one or more implementations. The example procedures may be employed in the environment 100 of FIG. 1, the system 900 of FIG. 9, and/or any other suitable environment. The procedures, for instance, represent example procedures for performing the implementation scenarios described above. In at least some implementations, the steps described for the various procedures are implemented automatically and independent of user interaction.

FIG. 4 is a flow diagram that describes steps in a method in accordance with one or more implementations. The method describes an example procedure for enabling a predictive model to be generated in accordance with one or more implementations.

Step 400 calculates a gradient value based on a data set applied to an initial data model. In at least one implementation, a source system 102 calculates the gradient value using backpropagation with a data set 108 and an initial model as input. As described above, the data set 108 may be divided into mini-batches, and thus a particular gradient value can be calculated for a discrete mini-batch.

Step 402 communicates the gradient value to an external service. A source system 102, for instance, communicates the gradient value to the host system 104.

Step 404 receives an average gradient value from the external service. For example, a source system 102 receives the average gradient value from the host system 104. Generally, the average gradient value represents an average of multiple gradient values received from multiple different source systems 102 and based on multiple different data sets 108. Further, the average gradient value is a noisy gradient, i.e., a raw average gradient value to which a noise term has been added.

Step 406 applies the average gradient value to the initial data model. A local module 110 at a source system 102, for instance, applies the average gradient value to an initial model to generate an updated model. For example, the average gradient value is used to update one or more weight and bias values for the initial model 202.
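One way the update of step 406 might look is sketched below. This is a minimal illustration that assumes a plain gradient-descent rule with a fixed learning rate; the function name `apply_gradient`, the `learning_rate` parameter, and the split of the received gradient into weight and bias parts are hypothetical and are not specified by the procedure above.

```python
def apply_gradient(weights, bias, grad_weights, grad_bias, learning_rate=0.1):
    """Return updated (weights, bias) after one descent step.

    Assumption: the noisy average gradient received from the host is
    applied with ordinary gradient descent, i.e. parameter minus
    learning_rate times gradient. The source does not fix the update rule.
    """
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_weights)]
    new_bias = bias - learning_rate * grad_bias
    return new_weights, new_bias


# Hypothetical values standing in for a model and a received noisy gradient.
updated_weights, updated_bias = apply_gradient(
    weights=[0.5, -0.2], bias=0.1, grad_weights=[0.2, -0.4], grad_bias=0.5
)
```

Because every source system applies the same received gradient to the same starting model, the locally held copies of the model would remain identical across source systems under this rule.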

Step 408 ascertains whether a termination criterion occurs. Generally, a termination criterion represents an event that indicates whether an iterative process of training the data model is to terminate. In at least one implementation, the termination criterion represents an indication that a set number of mini-batches 208 have been evaluated according to the process described above. In another example implementation, the termination criterion represents an indication that a specified number of iterations through the process have been performed. In another example implementation, the termination criterion represents an indication that the trained model did not significantly change for the last few iterations. In another example implementation, the termination criterion represents an indication that the accuracy of the model, as tested on some validation set, did not improve, or even deteriorated, over the last few iterations.
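Two of the termination criteria listed above, an iteration limit and stalled validation accuracy, can be sketched as follows. The function name `should_terminate`, the `patience` window, and the use of a list of per-iteration accuracies are assumptions made for illustration; the procedure does not define how many iterations count as "the last few".

```python
def should_terminate(iteration, max_iterations, accuracy_history, patience=3):
    """Return True when training should stop.

    Criterion 1: a specified number of iterations has been performed.
    Criterion 2 (assumed form): the best validation accuracy in the last
    `patience` iterations is no better than the best accuracy before them.
    """
    if iteration >= max_iterations:
        return True
    if len(accuracy_history) > patience:
        best_before = max(accuracy_history[:-patience])
        best_recent = max(accuracy_history[-patience:])
        if best_recent <= best_before:
            return True
    return False
```

For instance, with a history of `[0.6, 0.7, 0.8, 0.79, 0.8, 0.78]` and a patience of 3, the recent window never exceeds the earlier best of 0.8, so the check reports that training should stop.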

If the termination criterion does not occur (“No”), the process returns to step 400 where additional gradient values are calculated and used to update the data model.

If the termination criterion occurs (“Yes”), step 410 obtains a predictive model that represents a trained version of the initial data model. The predictive model, for instance, represents a neural network whose weights and biases have been trained according to techniques for predictive modeling from distributed datasets described herein. In at least one implementation, the predictive model can be generated locally at a source system 102 using noisy gradient values obtained from the host system 104. Alternatively or additionally, the predictive model can be received from the host system 104. Generally, the predictive model can be used for various purposes, such as predicting an outcome based on a set of input values.

FIG. 5 is a flow diagram that describes steps in a method in accordance with one or more implementations. The method describes an example procedure for generating a predictive model in accordance with one or more implementations.

Step 500 receives multiple gradient values from multiple different source systems. The host system 104, for instance, receives gradient values from multiple different source systems 102.

Step 502 generates an average gradient value from the multiple gradient values. Each of the multiple gradient values, for instance, is a different value, e.g., a different gradient of a loss function calculated at a respective source system 102. Thus, the host system 104 averages the different gradient values to obtain an average gradient value.

Step 504 adds a noise term to the average gradient value to generate a noisy gradient average. In at least one implementation, the noise term is random noise added to the average gradient value, such as a Laplace-distributed random number. In at least one implementation, the noisy gradient average can be calculated via interaction between multiple hosts, such as discussed with reference to the scenario 300. For instance, the noisy gradient average can be calculated via a garbled circuits protocol performed between the host systems 104 a, 104 b.
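Steps 502 and 504 at a single host system can be sketched as below. This is an illustrative sketch, not the described implementation: the function names are hypothetical, noise is assumed to be drawn independently for each coordinate, and a Laplace sample is produced as the difference of two exponential samples because the Python standard library has no direct Laplace sampler.

```python
import random


def average_gradients(gradients):
    """Coordinate-wise average of equally sized gradient vectors."""
    count = len(gradients)
    return [sum(column) / count for column in zip(*gradients)]


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample with the given scale.

    The difference of two independent exponential samples with rate
    1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_gradient_average(gradients, scale, rng):
    """Average the gradients, then add independent noise per coordinate."""
    return [value + laplace_noise(scale, rng)
            for value in average_gradients(gradients)]


# Hypothetical gradients received from two source systems.
rng = random.Random(0)
noisy = noisy_gradient_average([[1.0, 2.0], [3.0, 4.0]], scale=0.1, rng=rng)
```

A larger `scale` hides individual contributions more strongly at the cost of a less accurate average, which is the trade-off the noise term is meant to control.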

Step 506 communicates the noisy gradient average to the multiple different source systems. The host system 104, for instance, communicates the noisy gradient average to the multiple different source systems 102.

Step 508 ascertains whether a termination criterion occurs. Different examples of a termination criterion are discussed above. If the termination criterion does not occur (“No”), the process returns to step 500. For instance, further gradient values are received and are averaged to generate further noisy gradient averages, which are communicated back to the source systems 102. This process can be performed iteratively to enable the source systems 102 to iteratively train their respective data models.

If the termination criterion occurs (“Yes”), step 510 obtains a predictive model trained using the noisy gradient average. In at least one implementation, the predictive model can be generated locally at the host system 104, and/or locally at the individual source systems 102.

FIG. 6 is a flow diagram that describes steps in a method in accordance with one or more implementations. The method describes an example procedure for enabling a predictive model to be generated using multiple hosts in accordance with one or more implementations.

Step 600 calculates a gradient value based on a data set applied to a data model. In at least one implementation, a source system 102 calculates the gradient value using backpropagation with a data set 108 and the initial model 202 as input.

In at least one implementation, the gradient value is calculated as:

$g^{i} = \sum_{z \in Z^{i}} \mathrm{Clip}\left(C, F'(w_t, z)\right)$  (Equation 1)

where $Z^{i}$ is the dataset used in the $i$-th mini-batch, $C$ is a bound on a size of the gradient, $F$ is the function of the data model to be optimized, $w_t$ is the current weight vector, and $z$ is an example from the current mini-batch. Clip can be calculated as:

$\mathrm{Clip}(C, x) = \min\left(1, \frac{C}{\|x\|}\right) x,$  (Equation 2)

where x is the vector being calculated for the gradient value.
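Equations 1 and 2 can be sketched in a few lines. The sketch assumes the norm in Equation 2 is the Euclidean norm, which is not stated above, and it takes the per-example gradients as already computed; the function names are hypothetical.

```python
import math


def clip(bound, vector):
    """Scale `vector` so its Euclidean norm is at most `bound` (Equation 2).

    Assumption: the norm is Euclidean. A zero vector is returned
    unchanged to avoid dividing by zero.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    factor = min(1.0, bound / norm)
    return [factor * v for v in vector]


def clipped_gradient_sum(bound, per_example_gradients):
    """Sum of clipped per-example gradients over a mini-batch (Equation 1)."""
    total = [0.0] * len(per_example_gradients[0])
    for gradient in per_example_gradients:
        for k, value in enumerate(clip(bound, gradient)):
            total[k] += value
    return total
```

Clipping each per-example gradient before summing bounds how much any single record can move the mini-batch gradient, which in turn bounds how much noise is needed to mask that record.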

Step 602 generates a perturbed gradient value based on the gradient value and a perturbation value. A source system 102, for instance, generates a perturbation value, and adds the perturbation value to the original gradient value to generate the perturbed gradient value.

In one example, the perturbation value $r^{i}$ is generated as:

$r^{i} \leftarrow \mathrm{Laplace}(b),$  (Equation 3)

which represents a random vector with the same dimension as $g^{i}$ sampled from the Laplace distribution.

Accordingly, the perturbed gradient value can be generated as $g^{i} + r^{i}$.
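The split of one gradient into the two values that a source system sends out can be sketched as follows. The function name `make_shares` is hypothetical, each coordinate of the perturbation is assumed to be drawn independently from a zero-mean Laplace distribution with scale b, and the Laplace sample is built from two exponential samples using only the standard library.

```python
import random


def make_shares(gradient, scale, rng):
    """Return (perturbed_gradient, perturbation) for one gradient vector.

    Assumption: the perturbation has one independent Laplace(scale)
    coordinate per gradient coordinate, as suggested by Equation 3.
    """
    perturbation = [
        rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for _ in gradient
    ]
    perturbed = [g + r for g, r in zip(gradient, perturbation)]
    return perturbed, perturbation


# The perturbed gradient would go to the first host system and the
# perturbation to the second host system.
perturbed, perturbation = make_shares([0.3, -0.2], scale=1.0, rng=random.Random(42))
```

Subtracting the perturbation from the perturbed gradient recovers the original gradient, so the two values together carry the same information while each value is sent to a different host system.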

Step 604 communicates the perturbed gradient value to a first host system. The source system 102, for instance, communicates the perturbed gradient value to a first host system 104.

Step 606 communicates the perturbation value to a second host system. For example, the source system 102 communicates the perturbation value to a second host system 104. In at least one implementation, the first host system 104 and the second host system 104 represent host systems that are physically and/or communicatively remote from one another and that are protected from mutual access. Alternatively, the first host system 104 and the second host system 104 represent protected portions of a single larger system, such as different trusted platform modules (TPM) that reside on a single server and/or other computing device.

Step 608 receives an average gradient value from one or more of the first host system or the second host system. Generally, the average gradient value represents a perturbed average gradient value and is based on calculations performed at the different host systems using the perturbed gradient value and the perturbation value, as well as other perturbed gradient values and perturbation values from other source systems.

Step 610 applies the average gradient value to the data model. For instance, a weight value and/or a bias value from the average gradient value are applied to update (e.g., train) the data model.

Step 612 ascertains whether a termination criterion occurs. Different examples of termination criteria are discussed above. If the termination criterion does not occur (“No”), the process returns to step 600. For instance, the source system 102 determines a further gradient value based on the updated data model, and the process proceeds as indicated above using the further gradient value.

If the termination criterion occurs (“Yes”), step 614 obtains a predictive model that represents a trained version of the data model. The predictive model, for instance, is generated locally at the source system 102 and based on different gradient values received from the host systems 104. Alternatively or additionally, the predictive model is communicated to the source system 102 from one or more of the host systems 104.

FIG. 7 is a flow diagram that describes steps in a method in accordance with one or more implementations. The method describes an example procedure for enabling a predictive model to be generated using multiple hosts in accordance with one or more implementations. In this particular example, portions of the method are divided into actions at a first host system and actions at a second host system.

Step 700 receives perturbed gradients representing gradient values summed with perturbation values from multiple different source systems. The host system 104 a, for instance, receives the perturbed gradients from different source systems 102.

Step 702 sums the perturbed gradients to generate a gradient sum. For example, the host system 104 a sums a set of perturbed gradients to generate a gradient sum. The gradient sum $\tilde{g}_1$, for instance, is generated as:

$\tilde{g}_1 = \sum_i (g_i + r_i) \ \mathrm{smod}\ mC$  (Equation 4)

In at least one implementation, smod is a symmetric modulo operation, such as calculated as:

$x \ \mathrm{smod}\ C = ((x + C) \bmod 2C) - C$  (Equation 5)
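The symmetric modulo operation of Equation 5 maps a value into a range centered on zero. A minimal sketch follows; the function name `smod` mirrors the text, and the sketch relies on Python's `%` operator returning a non-negative result for a positive modulus, which is not true of the remainder operator in every language.

```python
def smod(x, c):
    """Symmetric modulo: map x into the half-open interval [-c, c).

    Relies on Python's % returning a result in [0, 2c) when 2c > 0,
    even for negative x.
    """
    return ((x + c) % (2 * c)) - c
```

For example, `smod(3, 5)` leaves 3 unchanged, while `smod(7, 5)` wraps around to -3 and `smod(-7, 5)` wraps around to 3.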

Step 704 calculates a first seed for a random number generator. The hostsystem 104 a, for example, calculates a seed value s₁.

Step 706 receives perturbation values from the multiple different sourcesystems. For instance, the host system 104 b receives perturbationvalues that were used to generate the perturbed gradients from multipledifferent source systems 102.

Step 708 sums the perturbation values to generate a perturbation sum. The host system 104 b, for example, sums the perturbation values as:

Equation 6: $\tilde{g}_2 = \sum_i r_i \ \mathrm{smod}\ mC$

Step 710 calculates a second seed for the random number generator. The host system 104 b, for example, calculates a seed value s₂.

Step 712 implements a secure computation protocol using the gradient sum, the perturbation sum, the first seed, and the second seed to generate a noisy average of the gradient values. The host systems 104 a, 104 b, for example, interact to perform a secure computation protocol using these different sets of values. In at least one implementation, the host systems 104 a, 104 b participate in a garbled circuits protocol to compute the noisy average as:

Equation 7: $((\tilde{g}_1 - \tilde{g}_2) \ \mathrm{smod}\ mC) + \mathrm{Rand}_{s_1 \oplus s_2}(b)$,

where b is an arbitrarily defined random noise parameter that is a function of the required privacy.
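The arithmetic of Equations 4, 6, and 7 can be sketched in Python as follows. This is a plain, in-the-clear simulation under stated assumptions, not the secure computation itself: no garbled circuits protocol is performed, so both sums are visible to the same process. The function names, the integer (fixed-point) gradient encoding, the bound `C`, the Laplace form of the noise (suggested by the implementations summarized later in this description), and the seed values are all illustrative assumptions. The symmetric modulo is applied to the whole sum, which is one reading of Equations 4 and 6.

```python
import random


def smod(x: int, c: int) -> int:
    """Symmetric modulo (Equation 5): maps x into [-c, c)."""
    return ((x + c) % (2 * c)) - c


def host_a_sum(perturbed, m, C):
    """Equation 4: host 104 a sums the perturbed gradients g_i + r_i."""
    return smod(sum(perturbed), m * C)


def host_b_sum(perturbations, m, C):
    """Equation 6: host 104 b sums the perturbation values r_i."""
    return smod(sum(perturbations), m * C)


def laplace_noise(seed: int, b: float) -> float:
    """Assumed noise source: a Laplace(0, b) sample from a seeded generator.

    The difference of two i.i.d. exponential samples with mean b is
    Laplace-distributed with scale b.
    """
    rng = random.Random(seed)
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def combine(g1, g2, s1, s2, m, C, b):
    """Equation 7, computed in the clear: remove perturbations, add noise."""
    return smod(g1 - g2, m * C) + laplace_noise(s1 ^ s2, b)


# Hypothetical integer-encoded gradients from m = 3 source systems, |g_i| < C.
C = 1000
gradients = [120, -340, 75]
m = len(gradients)

# Each source draws a perturbation, sends g_i + r_i to host a and r_i to host b.
source_rng = random.Random(0)
perturbations = [source_rng.randrange(-m * C, m * C) for _ in gradients]
perturbed = [g + r for g, r in zip(gradients, perturbations)]

g1 = host_a_sum(perturbed, m, C)
g2 = host_b_sum(perturbations, m, C)
exact_sum = smod(g1 - g2, m * C)  # perturbations cancel modulo 2mC
noisy = combine(g1, g2, s1=12345, s2=67890, m=m, C=C, b=2.0)
```

Because each perturbation appears once in each sum, the difference recovers the sum of the gradients whenever that sum lies in [-mC, mC), while neither host alone holds an unperturbed gradient value.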

Step 714 communicates the noisy average to a source system to enable a predictive model to be trained using the noisy average. One or more of the host systems 104 a, 104 b, for instance, communicate the noisy gradient 320 to the source systems 102 a, 102 b. Generally, the noisy gradient 320 can be used as part of a training step to generate a trained predictive model 116, e.g., a trained neural network.

Step 716 ascertains whether a termination criterion occurs. Different examples of termination criteria are discussed above. If the termination criterion does not occur (“No”), the process returns to step 700. For instance, the host systems 104 a, 104 b receive further gradient values from the source systems 102 a, 102 b, and the process iterates until a termination criterion occurs.

If the termination criterion occurs (“Yes”), step 718 obtains a predictive model that represents a trained version of an initial data model. The predictive model, for instance, is generated locally at the source systems 102 a, 102 b and based on different noisy gradient values received from the host systems 104. Alternatively or additionally, the predictive model is communicated to the source systems 102 a, 102 b from one or more of the host systems 104 a, 104 b.
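The iteration described above, as seen from a source system, can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the one-parameter linear model, the squared-error loss, the learning rate, and the `exchange` callback (a stand-in for sending a gradient to the host systems and receiving the noisy average back) are all hypothetical. The termination criterion used here is that every mini-batch has been evaluated, which is one of the criteria named in this description.

```python
from typing import Callable, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]  # (input x, target y) pairs


def local_gradient(weight: float, batch: Batch) -> float:
    """Gradient w.r.t. w of the mean of 0.5 * (w*x - y)**2 over a mini-batch."""
    return sum((weight * x - y) * x for x, y in batch) / len(batch)


def train(
    weight: float,
    mini_batches: Sequence[Batch],
    exchange: Callable[[float], float],
    learning_rate: float = 0.1,  # assumed hyperparameter
) -> float:
    """Iterate until every mini-batch has produced a gradient value."""
    for batch in mini_batches:
        g = local_gradient(weight, batch)        # calculate a gradient value
        g_avg = exchange(g)                      # receive the (noisy) average
        weight = weight - learning_rate * g_avg  # apply it to the data model
    return weight


# Toy data generated from y = 2x; the identity exchange stands in for the hosts.
batches = [[(1.0, 2.0)], [(2.0, 4.0)]]
trained = train(0.0, batches, exchange=lambda g: g)
```

With the identity exchange the loop reduces to ordinary mini-batch gradient descent; replacing `exchange` with a round trip to the host systems leaves the source-side logic unchanged.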

Generally, a predictive model generated according to techniques for using data sets for a predictive model described herein can be used for various purposes, such as predicting outcomes based on various input data sets and scenarios.

FIG. 8 is a flow diagram that describes steps in a method in accordance with one or more implementations. The method describes an example procedure for utilizing a predictive model in accordance with one or more implementations. The method, for instance, represents a continuation of one or more of the procedures described above.

Step 800 applies a set of input data to a predictive model. A source system 102, for example, receives a set of data and uses the set of data to evaluate a predictive model generated according to techniques for using data sets for a predictive model described herein. In at least some implementations, the set of data includes data values that are evaluated using the predictive model.

Step 802 ascertains an output of the predictive model. For instance, the predictive model provides an output prediction value based on values of the input data.

Step 804 performs, by a computing device, an action based on the output of the predictive model. Generally, the action can take various forms, such as performing different computation tasks based on the output of the predictive model. For example, consider that the predictive model is configured to provide a prediction of health condition. If the output of the predictive model indicates a possible adverse health condition, the action can include performing an automatic scheduling of a health procedure and/or an automatic communication to an individual regarding the possible adverse health condition.

As another example, consider that the predictive model is configured to provide a prediction of a possible computer network malfunction. For instance, the predictive model can include various conditions and events that are indicative of a potential network failure. Accordingly, the action can include performing an automated maintenance and/or diagnostic procedure on the network to attempt to prevent and/or repair a network malfunction.
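Steps 800 to 804 can be sketched in Python as follows. The logistic form of the model, the decision threshold, and the action labels are hypothetical choices made only for illustration; the description does not prescribe a particular model form or action mechanism.

```python
import math
from typing import Sequence


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Steps 800 and 802: apply input data and return an output score in (0, 1).

    A logistic model is assumed here purely for illustration.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def act_on_prediction(score: float, threshold: float = 0.5) -> str:
    """Step 804: choose an action based on the model output.

    The threshold and the action labels are hypothetical.
    """
    return "schedule_follow_up" if score >= threshold else "no_action"


score = predict([1.0, -1.0], 0.0, [2.0, 2.0])
action = act_on_prediction(score)
```

In a deployment, the returned label would be mapped to a concrete task such as the automatic scheduling or automated diagnostic procedures mentioned above.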

These examples are presented for purpose of illustration only, and it is to be appreciated that predictive models generated and/or trained according to techniques for using data sets for a predictive model described herein can be used for a variety of different purposes not expressly discussed in this disclosure.

Thus, techniques for using data sets for a predictive model described herein provide ways for generating predictive models based on data sets from a variety of different sources, while protecting the data used to generate the predictive models from being exposed to unauthorized parties. Further, computational resources are conserved by enabling local data sources to perform averaging of data points from large data sets, while allowing a centralized service (e.g., a host system 104 or set of host systems 104) to generate predictive models using the locally averaged data points.

Having discussed some example procedures, consider now a discussion of an example system and device in accordance with one or more implementations.

FIG. 9 illustrates an example system generally at 900 that includes an example computing device 902 that is representative of one or more computing systems and/or devices that may implement various techniques described herein. For example, the source systems 102 and/or the host systems 104 discussed above with reference to FIG. 1 can be embodied as the computing device 902. The computing device 902 may be, for example, a server of a service provider, a device associated with the client (e.g., a client device), an on-chip system, and/or any other suitable computing device or computing system.

The example computing device 902 as illustrated includes a processing system 904, one or more computer-readable media 906, and one or more Input/Output (I/O) Interfaces 908 that are communicatively coupled, one to another. Although not shown, the computing device 902 may further include a system bus or other data and command transfer system that couples the various components, one to another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. A variety of other examples are also contemplated, such as control and data lines.

The processing system 904 is representative of functionality to perform one or more operations using hardware. Accordingly, the processing system 904 is illustrated as including hardware elements 910 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 910 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, processors may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.

The computer-readable media 906 is illustrated as including memory/storage 912. The memory/storage 912 represents memory/storage capacity associated with one or more computer-readable media. The memory/storage 912 may include volatile media (such as random access memory (RAM)) and/or nonvolatile media (such as read only memory (ROM), Flash memory, optical disks, magnetic disks, and so forth). The memory/storage 912 may include fixed media (e.g., RAM, ROM, a fixed hard drive, and so on) as well as removable media (e.g., Flash memory, a removable hard drive, an optical disc, and so forth). The computer-readable media 906 may be configured in a variety of other ways as further described below.

Input/output interface(s) 908 are representative of functionality to allow a user to enter commands and information to computing device 902, and also allow information to be presented to the user and/or other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice recognition and/or spoken input), a scanner, touch functionality (e.g., capacitive or other sensors that are configured to detect physical touch), a camera (e.g., which may employ visible or non-visible wavelengths such as infrared frequencies to detect movement that does not involve touch as gestures), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, tactile-response device, and so forth. Thus, the computing device 902 may be configured in a variety of ways as further described below to support user interaction.

Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, such modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The terms “module,” “functionality,” “entity,” and “component” as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of commercial computing platforms having a variety of processors.

An implementation of the described modules and techniques may be stored on or transmitted across some form of computer-readable media. The computer-readable media may include a variety of media that may be accessed by the computing device 902. By way of example, and not limitation, computer-readable media may include “computer-readable storage media” and “computer-readable signal media.”

“Computer-readable storage media” may refer to media and/or devices that enable persistent storage of information in contrast to mere signal transmission, carrier waves, or signals per se. Computer-readable storage media do not include signals per se. The computer-readable storage media includes hardware such as volatile and non-volatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer readable instructions, data structures, program modules, logic elements/circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage device, tangible media, or article of manufacture suitable to store the desired information and which may be accessed by a computer.

“Computer-readable signal media” may refer to a signal-bearing medium that is configured to transmit instructions to the hardware of the computing device 902, such as via a network. Signal media typically may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as carrier waves, data signals, or other transport mechanism. Signal media also include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.

As previously described, hardware elements 910 and computer-readable media 906 are representative of instructions, modules, programmable device logic and/or fixed device logic implemented in a hardware form that may be employed in some implementations to implement at least some aspects of the techniques described herein. Hardware elements may include components of an integrated circuit or on-chip system, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or other hardware devices. In this context, a hardware element may operate as a processing device that performs program tasks defined by instructions, modules, and/or logic embodied by the hardware element as well as a hardware device utilized to store instructions for execution, e.g., the computer-readable storage media described previously.

Combinations of the foregoing may also be employed to implement various techniques and modules described herein. Accordingly, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 910. The computing device 902 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Accordingly, implementation of modules that are executable by the computing device 902 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and/or hardware elements 910 of the processing system. The instructions and/or functions may be executable/operable by one or more articles of manufacture (for example, one or more computing devices 902 and/or processing systems 904) to implement techniques, modules, and examples described herein.

As further illustrated in FIG. 9, the example system 900 enables ubiquitous environments for a seamless user experience when running applications on a personal computer (PC), a television device, and/or a mobile device. Services and applications run substantially similarly in all three environments for a common user experience when transitioning from one device to the next while utilizing an application, playing a video game, watching a video, and so on.

In the example system 900, multiple devices are interconnected through a central computing device. The central computing device may be local to the multiple devices or may be located remotely from the multiple devices. In one embodiment, the central computing device may be a cloud of one or more server computers that are connected to the multiple devices through a network, the Internet, or other data communication link.

In one embodiment, this interconnection architecture enables functionality to be delivered across multiple devices to provide a common and seamless experience to a user of the multiple devices. Each of the multiple devices may have different physical requirements and capabilities, and the central computing device uses a platform to enable the delivery of an experience to the device that is both tailored to the device and yet common to all devices. In one embodiment, a class of target devices is created and experiences are tailored to the generic class of devices. A class of devices may be defined by physical features, types of usage, or other common characteristics of the devices.

In various implementations, the computing device 902 may assume a variety of different configurations, such as for computer 914, mobile 916, and television 918 uses. Each of these configurations includes devices that may have generally different constructs and capabilities, and thus the computing device 902 may be configured according to one or more of the different device classes. For instance, the computing device 902 may be implemented as the computer 914 class of a device that includes a personal computer, desktop computer, a multi-screen computer, laptop computer, netbook, and so on.

The computing device 902 may also be implemented as the mobile 916 class of device that includes mobile devices, such as a mobile phone, portable music player, portable gaming device, a tablet computer, a wearable device, a multi-screen computer, and so on. The computing device 902 may also be implemented as the television 918 class of device that includes devices having or connected to generally larger screens in casual viewing environments. These devices include televisions, set-top boxes, gaming consoles, and so on.

The techniques described herein may be supported by these various configurations of the computing device 902 and are not limited to the specific examples of the techniques described herein. For example, functionalities discussed with reference to the source systems 102 and/or the host systems 104 may be implemented all or in part through use of a distributed system, such as over a “cloud” 920 via a platform 922 as described below.

The cloud 920 includes and/or is representative of a platform 922 for resources 924. The platform 922 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 920. The resources 924 may include applications and/or data that can be utilized while computer processing is executed on servers that are remote from the computing device 902. Resources 924 can also include services provided over the Internet and/or through a subscriber network, such as a cellular or Wi-Fi network.

The platform 922 may abstract resources and functions to connect the computing device 902 with other computing devices. The platform 922 may also serve to abstract scaling of resources to provide a corresponding level of scale to encountered demand for the resources 924 that are implemented via the platform 922. Accordingly, in an interconnected device embodiment, implementation of functionality described herein may be distributed throughout the system 900. For example, the functionality may be implemented in part on the computing device 902 as well as via the platform 922 that abstracts the functionality of the cloud 920.

Discussed herein are a number of methods that may be implemented to perform techniques discussed herein. Aspects of the methods may be implemented in hardware, firmware, or software, or a combination thereof. The methods are shown as a set of steps that specify operations performed by one or more devices and are not necessarily limited to the orders shown for performing the operations by the respective blocks. Further, an operation shown with respect to a particular method may be combined and/or interchanged with an operation of a different method in accordance with one or more implementations. Aspects of the methods can be implemented via interaction between various entities discussed above with reference to the environment 100.

In the discussions herein, various different implementations are described. It is to be appreciated and understood that each implementation described herein can be used on its own or in connection with one or more other implementations described herein. Further aspects of the techniques discussed herein relate to one or more of the following implementations.

A system for obtaining a predictive model, the system including: at least one processor; and one or more computer-readable storage media including instructions stored thereon that, responsive to execution by the at least one processor, cause the system to perform operations including: calculating a gradient value based on a data set applied to a data model, the gradient value including a weight value calculated for the data model; communicating the gradient value to an external service; receiving an average gradient value from the external service; applying the average gradient value to the data model; and obtaining, based on ascertaining that a termination criterion occurs, a predictive model that represents a trained version of the data model.

In addition to any of the above described systems, any one or combination of: wherein said calculating includes using a backpropagation procedure to train the data model using the data set; wherein said calculating includes: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches; wherein said calculating includes: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches, wherein the termination criterion includes determining that each mini-batch of the set of mini-batches is evaluated to generate a respective gradient value; wherein said applying includes applying the average gradient value to update a weight value of the data model; wherein the predictive model includes a neural network trained using the average gradient value; wherein the operations further include: applying a set of input data to the predictive model; ascertaining an output of the predictive model; and performing an action based on the output of the predictive model.

A computer-implemented method for obtaining a predictive model, the method including: receiving multiple gradient values from multiple different source systems; generating an average gradient value from the multiple gradient values; adding a noise term to the average gradient value to generate a noisy gradient average; communicating the noisy gradient average to the multiple different source systems; and obtaining a predictive model trained using the noisy gradient average.

In addition to any of the above described methods, any one or combination of: wherein said adding the noise term includes adding a Laplace-distributed random number to the average gradient value to generate the noisy gradient average; wherein said adding the noise term includes performing a garbled circuits protocol using the average gradient value; wherein the predictive model includes a neural network trained using the noisy gradient average.

A computer-implemented method for obtaining a predictive model, the method including: calculating a gradient value based on a data set applied to a data model; generating a perturbed gradient value based on the gradient value and a perturbation value; communicating the perturbed gradient value to a first host system; communicating the perturbation value to a second host system; receiving an average gradient value from one or more of the first host system or the second host system, the average gradient value calculated based on the perturbed gradient value and the perturbation value; applying the average gradient value to the data model; and obtaining a predictive model that represents a trained version of the data model, the data model trained at least in part using the average gradient value.

In addition to any of the above described methods, any one or combination of: wherein said calculating includes applying backpropagation to the data model and using the data set to calculate the gradient value; wherein said calculating includes: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches; wherein said generating the perturbed gradient value includes generating the perturbation value as a random vector, and adding the random vector to the gradient value to generate the perturbed gradient value; wherein said applying includes applying a weight value from the average gradient value to the data model; wherein said obtaining is performed in response to ascertaining that a termination criterion occurs; wherein the average gradient value is calculated using a garbled circuits protocol; wherein the predictive model includes a neural network trained using the average gradient value; further including: applying a set of input data to the predictive model; ascertaining an output of the predictive model; performing an action based on the output of the predictive model.

Techniques for using data sets for a predictive model are described. Although implementations are described in language specific to structural features and/or methodological acts, it is to be understood that the implementations defined in the appended claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed implementations.

What is claimed is:
 1. A system comprising: at least one processor; and one or more computer-readable storage media including instructions stored thereon that, responsive to execution by the at least one processor, cause the system to perform operations including: calculating a gradient value based on a data set applied to a data model, the gradient value including a weight value calculated for the data model; communicating the gradient value to an external service; receiving an average gradient value from the external service; applying the average gradient value to the data model; and obtaining, based on ascertaining that a termination criterion occurs, a predictive model that represents a trained version of the data model.
 2. A system as recited in claim 1, wherein said calculating comprises using a backpropagation procedure to train the data model using the data set.
 3. A system as recited in claim 1, wherein said calculating comprises: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches.
 4. A system as recited in claim 1, wherein said calculating comprises: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches, wherein the termination criterion comprises determining that each mini-batch of the set of mini-batches is evaluated to generate a respective gradient value.
 5. A system as recited in claim 1, wherein said applying comprises applying the average gradient value to update a weight value of the data model.
 6. A system as recited in claim 1, wherein the predictive model comprises a neural network trained using the average gradient value.
 7. A system as recited in claim 1, wherein the operations further include: applying a set of input data to the predictive model; ascertaining an output of the predictive model; and performing an action based on the output of the predictive model.
 8. A computer-implemented method, comprising: receiving multiple gradient values from multiple different source systems; generating an average gradient value from the multiple gradient values; adding a noise term to the average gradient value to generate a noisy gradient average; communicating the noisy gradient average to the multiple different source systems; and obtaining a predictive model trained using the noisy gradient average.
 9. A method as described in claim 8, wherein said adding the noise term comprises adding a Laplace-distributed random number to the average gradient value to generate the noisy gradient average.
 10. A method as described in claim 8, wherein said adding the noise term comprises performing a garbled circuits protocol using the average gradient value.
 11. A method as described in claim 8, wherein the predictive model comprises a neural network trained using the noisy gradient average.
 12. A computer-implemented method, comprising: calculating a gradient value based on a data set applied to a data model; generating a perturbed gradient value based on the gradient value and a perturbation value; communicating the perturbed gradient value to a first host system; communicating the perturbation value to a second host system; receiving an average gradient value from one or more of the first host system or the second host system, the average gradient value calculated based on the perturbed gradient value and the perturbation value; applying the average gradient value to the data model; and obtaining a predictive model that represents a trained version of the data model, the data model trained at least in part using the average gradient value.
 13. A method as described in claim 12, wherein said calculating comprises applying backpropagation to the data model and using the data set to calculate the gradient value.
 14. A method as described in claim 12, wherein said calculating comprises: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches.
 15. A method as described in claim 12, wherein said generating the perturbed gradient value comprises generating the perturbation value as a random vector, and adding the random vector to the gradient value to generate the perturbed gradient value.
 16. A method as described in claim 12, wherein said applying comprises applying a weight value from the average gradient value to the data model.
 17. A method as described in claim 12, wherein said obtaining is performed in response to ascertaining that a termination criterion occurs.
 18. A method as described in claim 12, wherein the average gradient value is calculated using a garbled circuits protocol.
 19. A method as described in claim 12, wherein the predictive model comprises a neural network trained using the average gradient value.
 20. A method as described in claim 12, further comprising: applying a set of input data to the predictive model; ascertaining an output of the predictive model; performing an action based on the output of the predictive model.